Search for: All records

Creators/Authors contains: "Hahn, E M"


  1. Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback). While RDPs enable intuitive and succinct representation of non-Markovian decision processes, their expressive power coincides with finite-state Markov decision processes (MDPs). We introduce omega-regular decision processes (ODPs), where the non-Markovian aspect of the transition and reward functions is extended to an ω-regular lookahead over the system evolution. Semantically, these lookaheads can be considered as promises made by the decision maker or the learning agent about her future behavior. In particular, we assume that if the promised lookaheads are not fulfilled, then the decision maker receives a payoff of ⊥ (the least desirable payoff), overriding any rewards collected by the decision maker. We enable optimization and learning for ODPs under the discounted-reward objective by reducing them to lexicographic optimization and learning over finite MDPs. We present experimental results demonstrating the effectiveness of the proposed reduction.
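The abstract above mentions a reduction to lexicographic optimization and learning over finite MDPs without spelling it out. As a rough, hypothetical illustration of lexicographic tabular learning on a product MDP, the sketch below keeps two Q-tables: a primary one for fulfilling the promised lookahead (avoiding the ⊥ payoff) and a secondary one for discounted reward, with greedy action selection restricted to actions that are near-optimal on the primary tier. The environment interface, reward signals, and hyperparameters are assumptions for the example, not the paper's construction.

```python
import random
from collections import defaultdict

# Lexicographic tabular Q-learning sketch over a (hypothetical) product MDP.
# q_safe estimates the primary objective: fulfilling the promised lookahead
# (reward 1.0 when the promise is kept, 0.0 when the run falls into "bottom").
# q_disc estimates the secondary objective: ordinary discounted reward.
GAMMA, ALPHA, EPSILON, SLACK = 0.99, 0.1, 0.1, 1e-3  # assumed hyperparameters

def lexicographic_q_learning(env, episodes=10_000):
    """`env` is a hypothetical environment with reset() -> state,
    actions(state) -> list, and step(action) -> (state, r_safe, r_disc, done)."""
    q_safe = defaultdict(float)
    q_disc = defaultdict(float)

    def greedy(s):
        acts = env.actions(s)
        top = max(q_safe[(s, a)] for a in acts)
        # Primary tier: keep only actions (nearly) maximizing the safety value,
        # then break ties by the secondary discounted-reward estimate.
        best = [a for a in acts if q_safe[(s, a)] >= top - SLACK]
        return max(best, key=lambda a: q_disc[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions(s)) if random.random() < EPSILON else greedy(s)
            s2, r_safe, r_disc, done = env.step(a)
            if done:
                target_safe, target_disc = r_safe, r_disc
            else:
                a2 = greedy(s2)  # bootstrap on the lexicographically greedy next action
                target_safe = r_safe + GAMMA * q_safe[(s2, a2)]
                target_disc = r_disc + GAMMA * q_disc[(s2, a2)]
            # One-step TD update per tier.
            q_safe[(s, a)] += ALPHA * (target_safe - q_safe[(s, a)])
            q_disc[(s, a)] += ALPHA * (target_disc - q_disc[(s, a)])
            s = s2
    return q_safe, q_disc
```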
  2. Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and ω-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces ω-regular reward machines, which integrate reward machines with ω-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute ε-optimal strategies against ω-regular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.
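The abstract above does not fix a concrete representation of reward machines. As a minimal, illustrative sketch, a finite-state reward machine can be viewed as a Mealy-style transducer over sets of atomic propositions, run in lockstep with the environment so that the emitted reward depends on the history of labels rather than only the current MDP state. The propositions, machine, and trace below are made up for the example, and the ω-regular (qualitative) component of the paper's ω-regular reward machines is not modeled here.

```python
from dataclasses import dataclass, field

@dataclass
class RewardMachine:
    """A small Mealy-style reward machine: each step it reads the set of atomic
    propositions that currently hold, moves to a new machine state, and emits a
    scalar reward, so the reward can depend on the whole history of labels."""
    initial: str
    delta: dict = field(default_factory=dict)  # (state, frozenset(props)) -> (state, reward)

    def run(self, label_trace):
        u, total = self.initial, 0.0
        for props in label_trace:
            u, r = self.delta[(u, frozenset(props))]
            total += r
        return u, total

# Illustrative machine: reward 1.0 only when 'goal' is observed after 'key'
# has already been observed -- a non-Markovian condition on the history.
rm = RewardMachine(
    initial="u0",
    delta={
        ("u0", frozenset()):          ("u0", 0.0),
        ("u0", frozenset({"key"})):   ("u1", 0.0),
        ("u0", frozenset({"goal"})):  ("u0", 0.0),  # goal before key: no reward
        ("u1", frozenset()):          ("u1", 0.0),
        ("u1", frozenset({"key"})):   ("u1", 0.0),
        ("u1", frozenset({"goal"})):  ("u1", 1.0),  # goal after key: rewarded
    },
)

print(rm.run([set(), {"key"}, set(), {"goal"}]))  # ('u1', 1.0)
```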
  3. Biere, Armin; Parker, David (Ed.)
    We characterize the class of nondeterministic 𝜔-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata ‘good-for-MDPs’ (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties—they are Büchi automata with low branching degree obtained through a simple construction—and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning. 
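The abstract above refers to state-space reduction based on direct and delayed simulation. As one small, illustrative example of the direct-simulation part, the sketch below computes a direct-simulation relation on a hypothetical nondeterministic Büchi automaton by greatest-fixed-point refinement and reports which states are simulation-equivalent (and hence candidates for merging); the automaton and its encoding are invented for the example, and delayed simulation is not covered.

```python
from itertools import product

# Hypothetical nondeterministic Büchi automaton over the alphabet {'a', 'b'}.
states    = {"q0", "q1", "q2", "q3"}
accepting = {"q2", "q3"}
alphabet  = {"a", "b"}
delta = {
    ("q0", "a"): {"q1", "q2"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q1"},
    ("q2", "a"): {"q3"},
    ("q2", "b"): {"q1"},
    ("q3", "a"): {"q2"},
    ("q3", "b"): {"q1"},
}

def succ(q, a):
    return delta.get((q, a), set())

def direct_simulation():
    """Greatest fixed point: keep (p, q) iff q is accepting whenever p is, and
    every move of p on every letter can be matched by a move of q that stays
    in the relation."""
    R = {(p, q) for p, q in product(states, states)
         if p not in accepting or q in accepting}
    changed = True
    while changed:
        changed = False
        for (p, q) in sorted(R):
            matched = all(
                any((p2, q2) in R for q2 in succ(q, a))
                for a in alphabet for p2 in succ(p, a)
            )
            if not matched:
                R.discard((p, q))
                changed = True
    return R

R = direct_simulation()
# States that simulate each other are language-equivalent under direct simulation
# and can be merged, shrinking the automaton before building the product with the MDP.
mergeable = sorted({tuple(sorted((p, q))) for (p, q) in R if p != q and (q, p) in R})
print("mergeable state pairs:", mergeable)  # e.g. [('q2', 'q3')]
```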